Worker communication optimization (aka removing netstring dependency) #644

Merged: 12 commits merged into versatica:v3 on Oct 7, 2021

Conversation

nazar-pc (Collaborator)

It's not like someone complained about it, but it bothered me quite a bit while working on worker communication.

The basic argument is that we use a human-readable format on the pipes, where no one is going to read the content anyway, yet we pay for things like decimal number formatting for every single message going back and forth between the library and the worker.

Like my other PRs, this one is best reviewed commit by commit; that way it will make the most sense.

What this PR does

  • replaces netstring serialization with a simple length-prefixed binary format that is easy to work with in most languages (see the sketch after this list)
  • unifies the PayloadChannel implementation with the Channel implementation in the worker (there was an extra 4 MB buffer allocated for no apparent reason)
  • fixes a bug around handling payload channel messages in Rust
  • adds a Rust benchmark for sending/receiving binary messages using direct transport
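To make the framing change concrete, here is a minimal TypeScript (Node.js) sketch of the two formats. The function names are hypothetical and this is not the actual mediasoup code; little-endian is assumed for the length prefix since Node.js Buffers have no "native endianness" writer.

```ts
// Old framing: netstring, i.e. "<decimal length>:<payload>,"
function encodeNetstring(payload: Buffer): Buffer {
  return Buffer.concat([
    Buffer.from(`${payload.length}:`, 'ascii'),
    payload,
    Buffer.from(',', 'ascii')
  ]);
}

// New framing: 4-byte length prefix followed by the raw payload.
function encodeLengthPrefixed(payload: Buffer): Buffer {
  const prefix = Buffer.allocUnsafe(4);

  prefix.writeUInt32LE(payload.length, 0);

  return Buffer.concat([prefix, payload]);
}
```

The length-prefixed form avoids formatting and parsing a decimal length for every message and is trivial to implement in any language.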

Results

  • while the Rust benchmark is not very consistent, it does indicate a slight performance improvement from these changes
  • memory consumption is 4 MB lower for each running worker due to the removal of the extra buffer in C++
  • there is a bit less code in Rust and it is simpler
  • there is much less code in C++ and arguably it is far more readable now, with the logic being simple and straightforward
  • I think there are fewer memory operations, and there are definitely fewer CPU cycles wasted in the C++ worker with these changes
  • there is one less dependency in TypeScript and C++ 🙂

Future work

One of the motivating factors, besides those mentioned above, is that I eventually want to eliminate pipes in Rust<->C++ communication and instead access each other's memory directly if possible. Having just a size prefix in bytes followed by the data, without netstring sprinkled in, makes it slightly easier to refactor things later.

Once pipes are eliminated, it should (theoretically) be possible to implement custom SFU logic/extensions in Rust land more or less efficiently by grabbing and dispatching packets using a direct transport.

@nazar-pc nazar-pc requested review from ibc and jmillan August 21, 2021 06:54
@jmillan (Member) commented Aug 23, 2021

Makes a lot of sense to me 👍

@ibc (Member) commented Aug 23, 2021

Please don't merge this yet. I have to do a proper review and there are changes I don't like. For instance, the channel message size needs to be 4 MB+ in order to allow big stats JSON files.

@nazar-pc (Collaborator, Author) commented Aug 23, 2021

Sure, take your time.

To the best of my knowledge, this doesn't change the maximum message size.

Also, I do not fully understand why there is a size limit in the first place. It should be possible to use a vector that grows on demand instead of a fixed-size buffer, without a noticeable performance impact. In fact, on the receiving side Rust already supports anything up to 4 GB, and on the TypeScript side the limit is artificial as well (we can just remove the if statement and nothing will break).
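To illustrate what I mean by growing on demand, here is a hypothetical TypeScript sketch (not code from this PR; the class name and sizes are made up) of a read buffer that expands whenever an incoming chunk does not fit, instead of enforcing a fixed maximum size:

```ts
// Hypothetical sketch of a grow-on-demand read buffer.
class GrowableReadBuffer {
  private buffer = Buffer.alloc(1024 * 1024); // start at 1 MiB instead of a fixed cap
  private length = 0;

  append(chunk: Buffer): void {
    if (this.length + chunk.length > this.buffer.length) {
      // Double the capacity until the incoming chunk fits.
      let newSize = this.buffer.length * 2;

      while (this.length + chunk.length > newSize) {
        newSize *= 2;
      }

      const bigger = Buffer.alloc(newSize);

      this.buffer.copy(bigger, 0, 0, this.length);
      this.buffer = bigger;
    }

    chunk.copy(this.buffer, this.length);
    this.length += chunk.length;
  }
}
```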

@nazar-pc force-pushed the communication-optimization branch from 861035d to f1888b3 on August 25, 2021
@ibc (Member) commented Sep 10, 2021

@nazar-pc I need help on how to proceed with this PR and the Meson one. This one removes netstring and the other one moves the worker build to the Meson build system, so obviously they are going to conflict.

@ibc (Member) commented Sep 10, 2021

@nazar-pc some cosmetic changes requested and some questions.

I'm a bit afraid of these changes because the current code has worked reliably for years (not that I doubt this PR). Are you already testing this code in your mediasoup setups?

@nazar-pc (Collaborator, Author) left a comment

> @nazar-pc I need help on how to proceed with this PR and the Meson one. This one removes netstring and the other one moves the worker build to the Meson build system, so obviously they are going to conflict.

I didn't assume that all my changes would necessarily be accepted, so I made two independent PRs. This one is simpler (I think) and should go first if it seems reasonable. I will update the Meson PR to resolve any conflicts, including netstring removal.

> I'm a bit afraid of these changes because the current code has worked reliably for years (not that I doubt this PR). Are you already testing this code in your mediasoup setups?

I do not have a production deployment with these changes, but I have been using them for ~2 weeks in local development, various tests and benchmarks and have not found any problems so far.

To summarize, the messaging mechanism with this PR is the following:

  1. Reads
    • Try to read 4 bytes
      • If there is not enough data, do nothing and wait for more data
      • Otherwise, interpret them as the 32-bit length of the upcoming message, in the platform's native endianness
    • Try to read the message according to the length from those 4 bytes
      • If there is not enough data, do nothing and wait for more data
      • Otherwise, interpret it as a new message and remove the length prefix and message from the read buffer
    • Repeat
    • Both C++ and TypeScript have an optimization that doesn't remove the beginning of the read buffer immediately; instead, all complete messages are read and just one std::memmove() is done at the very end, which reduces the number of unnecessary memory copies
  2. Writes
    • Write 4 bytes with the 32-bit length of the message, in native endianness
    • Write the message itself

So while the changes are invasive, I would argue the resulting algorithm is simpler than the one that was in place before (see the sketch below).
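Here is a minimal TypeScript sketch of that read loop. The helper name is hypothetical, little-endian is assumed for illustration, and this is not the actual Channel.ts implementation:

```ts
// Parses all complete length-prefixed messages from `buffer` and returns the
// number of unconsumed bytes left at the beginning of the buffer.
function readMessages(
  buffer: Buffer,
  byteLength: number,
  onMessage: (msg: Buffer) => void
): number {
  let offset = 0;

  while (offset + 4 <= byteLength) {
    const messageLength = buffer.readUInt32LE(offset);

    // Not enough data for the whole message yet; wait for more.
    if (offset + 4 + messageLength > byteLength) {
      break;
    }

    onMessage(buffer.subarray(offset + 4, offset + 4 + messageLength));
    offset += 4 + messageLength;
  }

  // Single copy at the end: move any incomplete tail to the front of the
  // buffer, mirroring the "one std::memmove() at the very end" optimization.
  if (offset > 0 && offset < byteLength) {
    buffer.copy(buffer, 0, offset, byteLength);
  }

  return byteLength - offset;
}
```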

@nazar-pc mentioned this pull request on Sep 18, 2021
@ibc (Member) commented Oct 7, 2021

Testing this

@ibc (Member) left a comment

No issues in my tests.

@ibc merged commit af15b52 into versatica:v3 on Oct 7, 2021
@ibc (Member) commented Oct 7, 2021

Merged in v3 :)

@nazar-pc deleted the communication-optimization branch on October 8, 2021
@jmillan (Member) left a comment

👍

@GEverding

Hi, we recently upgraded mediasoup in production and observed some really weird behaviour in direct transports and data channels. I think I've tracked it down to the 3.9.0 release and this PR.

What we observed was extremely high memory usage and a multi-minute delay in chat message delivery, but only when we had 8 people connected to a room. What's interesting is that the messages would come in waves: everyone in the room would get them all at once, then nothing. Normal usage of data channels with WebRTC transports appears normal. We think the issue relates to direct transports used in conjunction with WebRTC transports.

Our architecture looks something like this:

sender WebRTC data producer -> sender direct data consumer -> sender direct data producer -> receiver direct data consumer -> receiver direct data producer -> receiver WebRTC data consumer

We can reproduce this in our testing environment, but I'm not sure how to debug this behaviour.

@nazar-pc (Collaborator, Author)

@GEverding, your question should be posted on the forum, not in this PR.
